Search Results

Documents authored by Valiant, Paul


Document
Optimal Sub-Gaussian Mean Estimation in Very High Dimensions

Authors: Jasper C.H. Lee and Paul Valiant

Published in: LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)


Abstract
We address the problem of mean estimation in very high dimensions, in the high probability regime parameterized by failure probability δ. For a distribution with covariance Σ, let its "effective dimension" be d_eff = {Tr(Σ)}/{λ_{max}(Σ)}. For the regime where d_eff = ω(log^2 (1/δ)), we show the first algorithm whose sample complexity is optimal to within 1+o(1) factor. The algorithm has a surprisingly simple structure: 1) re-center the samples using a known sub-Gaussian estimator, 2) carefully choose an easy-to-compute positive integer t and then remove the t samples farthest from the origin and 3) return the sample mean of the remaining samples. The core of the analysis relies on a novel vector Bernstein-type tail bound, showing that under general conditions, the sample mean of a bounded high-dimensional distribution is highly concentrated around a spherical shell.

Cite as

Jasper C.H. Lee and Paul Valiant. Optimal Sub-Gaussian Mean Estimation in Very High Dimensions. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 98:1-98:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{lee_et_al:LIPIcs.ITCS.2022.98,
  author =	{Lee, Jasper C.H. and Valiant, Paul},
  title =	{{Optimal Sub-Gaussian Mean Estimation in Very High Dimensions}},
  booktitle =	{13th Innovations in Theoretical Computer Science Conference (ITCS 2022)},
  pages =	{98:1--98:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-217-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{215},
  editor =	{Braverman, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.98},
  URN =		{urn:nbn:de:0030-drops-156942},
  doi =		{10.4230/LIPIcs.ITCS.2022.98},
  annote =	{Keywords: High-dimensional mean estimation}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail